14 research outputs found
A BERT-based dual embedding model for Chinese idiom prediction
Chinese idioms are special fixed phrases usually derived from ancient
stories, whose meanings are often highly idiomatic and non-compositional.
The Chinese idiom prediction task is to select the correct idiom from a set of
candidate idioms given a context with a blank. We propose a BERT-based dual
embedding model to encode the contextual words as well as to learn dual
embeddings of the idioms. Specifically, we first match the embedding of each
candidate idiom with the hidden representation corresponding to the blank in
the context. We then match the embedding of each candidate idiom with the
hidden representations of all the tokens in the context through context
pooling. We further propose to use two separate idiom embeddings for the two
kinds of matching. Experiments on a recently released Chinese idiom cloze test
dataset show that our proposed method performs better than the existing state
of the art. Ablation experiments also show that both context pooling and dual
embedding contribute to the improvement of performance.
Comment: COLING 202
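The two matching steps described above can be sketched numerically. This is a minimal illustration only: random vectors stand in for the BERT hidden states, and all names (`E_blank`, `E_ctx`, `context_pool`) are assumptions for exposition, not the paper's actual code. The key idea shown is that each candidate idiom has two separate embeddings, one matched against the blank position and one matched against a pooled summary of the whole context.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_candidates, seq_len = 8, 7, 12

# Hypothetical encoder output: one hidden vector per context token,
# with blank_pos marking the position of the blank ([MASK]) token.
H = rng.standard_normal((seq_len, hidden_dim))
blank_pos = 5

# Dual embeddings: one table for blank matching, one for context matching.
E_blank = rng.standard_normal((n_candidates, hidden_dim))
E_ctx = rng.standard_normal((n_candidates, hidden_dim))

def context_pool(H, e):
    """Attention-style pooling: weight each token by its similarity to idiom e."""
    sims = H @ e
    w = np.exp(sims - sims.max())
    w /= w.sum()
    return w @ H  # (hidden_dim,) pooled context vector

# Matching 1: idiom embedding vs. hidden state at the blank.
blank_scores = E_blank @ H[blank_pos]
# Matching 2: idiom embedding vs. pooled representation of all tokens.
ctx_scores = np.array([context_pool(H, e) @ e for e in E_ctx])

# Final score combines both kinds of matching; highest score wins.
scores = blank_scores + ctx_scores
pred = int(np.argmax(scores))
```

A trained model would learn `E_blank` and `E_ctx` jointly with the encoder and pick the candidate with the highest combined score.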
Efficient organic solar cells enabled by simple non-fused electron donors with low synthetic complexity
Abstract Fused-ring electron donors boost the efficiency of organic solar cells (OSCs), but they suffer from high cost and low yield owing to their large synthetic complexity (SC > 30%). Herein, the authors develop a series of simple non-fused-ring electron donors, PF1 and PF2, which alternately consist of furan-3-carboxylate and 2,2′-bithiophene. Note that PF1 and PF2 present a very small SC of 9.7% owing to their inexpensive raw materials, facile synthesis, and high synthetic yield. Compared to their all-thiophene-backbone counterpart PT-E, the two new polymers feature a larger conjugated plane, resulting in higher hole mobility, especially a value up to ~10⁻⁴ cm² V⁻¹ s⁻¹ for PF2 with its longer alkyl side chain. Meanwhile, PF1 and PF2 exhibit a larger dielectric constant and deeper electronic energy levels versus PT-E. Benefiting from these better physicochemical properties, the efficiencies of PF1- and PF2-based devices are improved by ~16.7% and ~71.3% relative to those of PT-E-based devices, respectively. Furthermore, optimized PF2-based devices introducing PC71BM as a third component deliver a higher efficiency of 12.40%. The work not only indicates that furan-3-carboxylate is a simple yet efficient building block for constructing non-fused-ring polymers but also provides a promising electron donor, PF2, for the low-cost production of OSCs.
A simple-structure non-fused-ring electron donor, PF2, alternately consisting of furan-3-carboxylate and 2,2′-bithiophene, presents a very small synthetic complexity of 9.7% as well as a low material cost of ~19.0 $ g⁻¹. More importantly, PF2 delivers a high efficiency of 12.4% coupled with strong operational stability.
HiJoNLP at SemEval-2022 Task 2: Detecting Idiomaticity of Multiword Expressions using Multilingual Pretrained Language Models
This paper describes an approach to detect idiomaticity only from the
contextualized representation of an MWE over multilingual pretrained language
models. Our experiments find that larger models are usually more effective in
idiomaticity detection. However, using a higher layer of the model does not
guarantee better performance. In multilingual scenarios, the convergence of
different languages is not consistent, and rich-resource languages have a
clear advantage over other languages.
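The probing setup described above can be sketched as follows. This is a schematic with random arrays standing in for a multilingual encoder's per-layer outputs (a real run would use e.g. XLM-R hidden states); the helper names `mwe_representation` and `idiomatic_prob`, the span indices, and the probe weights are all invented for illustration. It shows the core recipe: pool the MWE's contextualized token vectors at a chosen layer, then score them with a lightweight classifier, repeating per layer to compare layers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, seq_len, dim = 13, 10, 16  # e.g. a 12-layer encoder + embedding layer

# Hypothetical per-layer hidden states for one sentence containing an MWE.
hidden_states = rng.standard_normal((n_layers, seq_len, dim))

def mwe_representation(hidden_states, span, layer):
    """Mean-pool the contextualized vectors of the MWE's tokens at one layer."""
    start, end = span
    return hidden_states[layer, start:end].mean(axis=0)

# Probe: a linear classifier over the pooled span vector (weights invented).
w, b = rng.standard_normal(dim), 0.0

def idiomatic_prob(vec):
    """Sigmoid score: probability the MWE is used idiomatically."""
    return 1.0 / (1.0 + np.exp(-(vec @ w + b)))

# Build one probe score per layer, as in a layer-wise analysis: a higher
# layer does not automatically give a better (more separable) representation.
probs = [idiomatic_prob(mwe_representation(hidden_states, (3, 6), layer))
         for layer in range(n_layers)]
```

In the actual experiments the probe would be trained on labeled data per layer and per language; the comparison across layers and languages is what surfaces the inconsistencies the abstract reports.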
Exploring and adapting Chinese GPT to pinyin input method
While GPT has become the de-facto method for text generation tasks, its
application to pinyin input method remains unexplored. In this work, we make
the first exploration to leverage Chinese GPT for pinyin input method. We find
that a frozen GPT achieves state-of-the-art performance on perfect pinyin.
However, the performance drops dramatically when the input includes abbreviated
pinyin. One reason is that an abbreviated pinyin can map to many perfect
pinyins, which in turn link to an even larger number of Chinese characters. We mitigate
this issue with two strategies, including enriching the context with pinyin and
optimizing the training process to help distinguish homophones. To further
facilitate the evaluation of pinyin input method, we create a dataset
consisting of 270K instances from 15 domains. Results show that our approach
improves performance on abbreviated pinyin across all domains. Model analysis
demonstrates that both strategies contribute to the performance boost.
Comment: To appear in ACL 202
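The ambiguity that makes abbreviated pinyin hard can be made concrete with a toy example. The dictionary below is a tiny invented sample (not the paper's data), and both helper functions are assumptions for illustration: they show how one abbreviation fans out to several perfect pinyins and, through them, to many candidate characters.

```python
# Tiny invented sample: each perfect pinyin maps to several Chinese characters.
perfect_pinyin = {
    "shi": ["是", "时", "十", "事", "实"],
    "shan": ["山", "善", "闪"],
    "shang": ["上", "商", "伤"],
}

def expand_abbreviation(abbr: str) -> list[str]:
    """All perfect pinyins whose spelling starts with the abbreviated input."""
    return [p for p in perfect_pinyin if p.startswith(abbr)]

def candidate_chars(abbr: str) -> list[str]:
    """Every character reachable from a (possibly abbreviated) pinyin input."""
    return [c for p in expand_abbreviation(abbr) for c in perfect_pinyin[p]]

# The perfect pinyin "shi" selects a single syllable (5 characters), while
# the abbreviation "sh" fans out to 3 syllables and 11 candidate characters.
# That fan-out is the ambiguity the paper's two strategies aim to reduce.
```

Even in this toy setting the candidate set roughly doubles when the input is abbreviated, which is why enriching the context with pinyin and training the model to separate homophones both help.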